-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NPUW: Enable PREFILL/GENERATE configs in LLMCompiledModel #28154
base: master
Are you sure you want to change the base?
NPUW: Enable PREFILL/GENERATE configs in LLMCompiledModel #28154
Conversation
e38b474
to
7d88863
Compare
@TolyaTalamanov please review |
7d88863
to
b52da47
Compare
src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp
Outdated
Show resolved
Hide resolved
* Tell NPUW the configuration for compilation of prefill model. | ||
* NOTE: !! Write-only !! | ||
*/ | ||
static constexpr ov::Property<std::string> prefill_config{"NPUW_LLM_PREFILL_CONFIG"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why do we even need the Property
for this?
They idea is that user may provide it like this:
model = read_model(...);
auto compiled = core.compile_model(model, "NPU", { "NPUW_LLM_PREFILL_CONFIG": {...} });
Note, there is no need for user to set
or get
this config later on. It just should be passed once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just for us that all things are in one place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus these are also properties, they shouldn't be handled another way, because it will seem as hack. We need unified place to show all properties we have and unified way of handling them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just for us that all things are in one place
TBH, didn't get the point. What are the things
and why there should be in one place?
My point is that having llm config params (e.g NPUW_LLM_PREFILL_CONFIG
, ...) as properties
complicates implementation as it brings more responsibilities to properly handle them. When it's just ov::AnyMap
, it's parsed once in llm_compiled_model.cpp
and then forgotten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All properties that are passed into core.compile_model(model, device, properties)
should be handled in one place.
It is passed as properties
into core.compile_model(...)
, not as any extra
parameters or some custom thing. That means we should treat this as property.
@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) { | |||
|
|||
ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const { | |||
OPENVINO_SUPPRESS_DEPRECATED_START | |||
if (name == ov::intel_npu::npuw::llm::prefill_config.name() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe it's really needed, see comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_property()
might be not needed at all here, so as it is a redudant functionality, I suppose to at least handle everything in a unified way here to not create a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keys provided to LLM
pipeline must not be properties
, so there won't be any mess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what they should be in this situation? And how they should be passed into core.compile_model()
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compile_model
accept ov::AnyMap
, doesn't it? https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html
GenAI pipeline does it exactly this way
@TolyaTalamanov have you finished with review, should this be merged? @AsyaPronina there are merge conflicts |
* Tell NPUW the configuration for compilation of prefill model. | ||
* NOTE: !! Write-only !! | ||
*/ | ||
static constexpr ov::Property<ov::AnyMap> prefill_config{"NPUW_LLM_PREFILL_CONFIG"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPUW_LLM_PREFILL_CONFIG
and NPUW_LLM_GENERATE_CONFIG
are supposed to be passed to compile(...)
once and then can be forgotten. Why do we need to define properties for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we pass them as properties
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the exact question. My point was not to make it as property
.
It will only simplify implementation, don't see use case where user would like to query these properties
for read/write, is there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed internally
@@ -421,6 +429,13 @@ static constexpr ov::Property<uint32_t> min_response_len{"NPUW_LLM_MIN_RESPONSE_ | |||
*/ | |||
static constexpr ov::Property<std::string> generate_hint{"NPUW_LLM_GENERATE_HINT"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, I'd not make it as property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed internally
// preserve them somewhere. | ||
auto prefill_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_PREFILL_CONFIG")); | ||
auto generate_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_GENERATE_CONFIG")); | ||
|
||
m_cfg.update(any_copy(npuw_llm_props)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe nothing from npuw_llm_props
should get into m_cfg
, right?
Everything related to LLM pipeline can be extracted here and then forgotten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, however how would you check and test what have you passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same way as it's done in GenAI. You can put any checks within this file...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Discussed internally
@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) { | |||
|
|||
ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const { | |||
OPENVINO_SUPPRESS_DEPRECATED_START | |||
if (name == ov::intel_npu::npuw::llm::prefill_config.name() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keys provided to LLM
pipeline must not be properties
, so there won't be any mess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Propose not to make LLMCompiledModel
config options as property
...
The rest of implementation LGTM 👍
* Tell NPUW the configuration for compilation of prefill model. | ||
* NOTE: !! Write-only !! | ||
*/ | ||
static constexpr ov::Property<ov::AnyMap> prefill_config{"NPUW_LLM_PREFILL_CONFIG"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the exact question. My point was not to make it as property
.
It will only simplify implementation, don't see use case where user would like to query these properties
for read/write, is there?
// preserve them somewhere. | ||
auto prefill_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_PREFILL_CONFIG")); | ||
auto generate_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_GENERATE_CONFIG")); | ||
|
||
m_cfg.update(any_copy(npuw_llm_props)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same way as it's done in GenAI. You can put any checks within this file...
@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) { | |||
|
|||
ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const { | |||
OPENVINO_SUPPRESS_DEPRECATED_START | |||
if (name == ov::intel_npu::npuw::llm::prefill_config.name() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compile_model
accept ov::AnyMap
, doesn't it? https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html
GenAI pipeline does it exactly this way
a263f2c
to
5a81b3c
Compare
5a81b3c
to
0c0b36a
Compare
Details:
NPUW_LLM_PREFILL_CONFIG
andNPUW_LLM_GENERATE_CONFIG
optionsNPUW_LLM_PAD_TOKEN_ID
Tickets:
Related PRs: